INTERFACE MESSAGE PROCESSORS FOR THE
ARPA COMPUTER NETWORK


Frank  E.  Heart 

Bolt  Beranek  and  Newman,  Incorporated 


Prepared  for: 

Advanced  Research  Projects  Agency 
April  1973 


DISTRIBUTED  BY: 


National  Technical  Information  Service 
U.  S.  DEPARTMENT  OF  COMMERCE 

5285  Port  Royal  Road,  Springfield  Va.  22151 




BOLT BERANEK AND NEWMAN INC

CONSULTING • DEVELOPMENT • RESEARCH


Report  No.  2541 


April  1973 


INTERFACE MESSAGE PROCESSORS FOR
THE ARPA COMPUTER NETWORK

QUARTERLY TECHNICAL REPORT NO. 1
1 January 1973 to 31 March 1973

Principal  Investigator:  Mr.  Frank  E.  Heart 

Telephone  (617)  491-1850,  Ext.  470 


Sponsored  by 

Advanced  Research  Projects  Agency 
ARPA  Order  No.  2351 


Contract No. F08606-73-C-0027
Effective Date: 1 January 1973

Expiration  Date:  31  December  1973 
Contract  Amount:  $2,147,071 


Title  of  Work:  IMP  Program 


Submitted  to: 



IMP  Program  Manager 
Range  Measurements  Lab. 
Building  981 
Patrick  Air  Force  Base 
Cocoa Beach, Florida 32925




Report  No.  2541 


Bolt  Beranek  and  Newman  Inc. 


INTERFACE  MESSAGE  PROCESSORS  FOR 
THE  ARPA  COMPUTER  NETWORK 


QUARTERLY  TECHNICAL  REPORT  NO.  1 
1  January  1973  to  31  March  1973 


Submitted  to: 

IMP  Program  Manager 
Range  Measurements  Lab. 
Building  981 
Patrick Air Force Base
Cocoa  Beach,  Florida  32925 


This research was supported by the Advanced Research Projects
Agency of the Department of Defense under Contract No. F08606-
73-C-0027.




TABLE OF CONTENTS

1. OVERVIEW

1.1 IMP/TIP Memory Retrofit Program

1.2 The TIP's Network Status Facility

1.3 Network Traffic

2. NETWORK RELIABILITY

2.1 IMP Program Changes

2.2 TIP Program Changes

3. HSMIMP DEVELOPMENT




1.  OVERVIEW 

This Quarterly Technical Report, Number 1, describes aspects
of our work on the ARPA Computer Network during the first quarter
of 1973.

During this quarter we installed one 316 IMP, at Lawrence
Berkeley Laboratory, and one TIP, at the Range Measurements
Laboratory. Also, the 316 IMP which had previously been in operation
at Tinker Air Force Base was removed from the network during the
quarter. At the end of the quarter the network included 35
operational IMPs and TIPs plus an experimental TIP at BBN. The first
quarter saw the installation of the first "Very Distant" Host,
connecting Speech Communications Research Laboratory, Inc. to the
UCSB IMP.

A  major  activity  during  the  first  quarter  has  been  making 
IMPs  less  sensitive  to  hardware  failures  in  themselves  or  their 
neighbors.  The  motivation  for,  and  conclusions  of,  this  effort 
are  described  in  Section  2. 

Work on the High Speed Modular IMP (HSMIMP) continued through
the first quarter. Section 3 provides a survey and progress
report on the HSMIMP development.

We completed the implementation of a Network Control Program,
and some TELNET capabilities, for our PDP-1 during the first
quarter. At one time we intended to make Network Control Center
(NCC) traffic summary data available to the network community
via TELNET on the PDP-1, but following a recent expression of ARPA
preference we have decided to store this data on a TENEX machine
instead.
We continue to study the problems of subnetwork routing.
During the past quarter these studies have concentrated on two
topics: possible improvements in the speed and efficiency of
routing propagation, and area routing. On the former topic, we
have been particularly interested in developing routing
propagation methods which allow for a variety of line speeds and allow
for bursts of traffic while at all times providing unambiguous
routing information, even in the face of severe network transients.
On the latter topic we have been particularly interested in
developing techniques which allow hierarchies of areas while
minimizing the possibility that a node can be cut off from other nodes;
this seems to require dynamic configuration of areas. Additionally,
in light of our new ideas on area routing and routing propagation,
we considered routing over a broadcast channel and routing based
on available capacity rather than delay.

During the first quarter our satellite communications effort
centered on three areas: study of the long-term placement of
Satellite IMPs (SIMPs); continued simulation and analysis of the
Reservation-ALOHA and other related satellite protocols; and study,
analysis, and simulation of algorithms for slotting a broadcast channel.

The SIMP placement study resulted in the recommendation that
SIMPs be placed in the satellite ground stations. Continued
simulation and analysis of Reservation-ALOHA led to a better
understanding of its weaknesses (a tendency toward instability is one)
and its strengths (it is generally comparable to the Interleaved
Reservations system of Roberts, with both achieving almost
complete channel utilization at reasonable delays). The
analysis and simulation of slotting algorithms has led to the
conclusion that slotting is relatively easy for a 50 Kb channel
and possible for a megabit-rate channel.
We continue to be heavily involved in network Host protocol
development. During the past quarter we helped to organize two
protocol workshops: one specified a major revision of
the TELNET Protocol and the other refined the File Transfer
Protocol. In addition to our technical contributions at these
workshops, BBN is producing the documentation of the new protocols.
We have also been involved in an exchange of ideas about Host
protocol with the International Network Working Group.

Two papers describing aspects of our work on the network
project were presented during the first quarter. One, entitled
"A System for Broadcast Communication: Reservation-ALOHA," was
presented at the 1973 Hawaii System Science Conference. The
second, "Terminal Access to the ARPA Network: Experience and
Improvements," which describes directions of TIP development, was
presented at COMPCON 73 (the seventh annual IEEE Computer Society
International Conference).

1.1  IMP/TIP  Memory  Retrofit  Program 
Unfortunately, the retrofit program has had some negative
effects on network reliability, especially during February and March.
First, of course, is the fact that for many machines the retrofit
has removed the machine from use for several hours, or in a few
cases, more than one day. This always breaks one network path,
and usually results in a singly-connected "stub" of one or more
other sites. At such times the network is unusually vulnerable to
single IMP or line failures elsewhere. We can, however, frequently
replace a "missing" machine by simply wiring the output signal from
one modem to the input of a second modem through "patch cords" at
the customer side of the modems. Since each modem is running a
data clock, and the clocks are independent, this approach causes
bits to be occasionally dropped or picked as the clocks drift or
jitter relative to each other. (In fact, if the two clocks were
running at opposite ends of their allowed frequency range, bits
would be dropped or picked at a very rapid rate.) Because of this,
the "composite" line looks much noisier to the IMPs at each end
than either of the common carrier's circuits (typically a 1% to 10%
error rate). Nevertheless, this approach does allow us to maintain
network connectivity during extended IMP outages, since the normal
IMP-IMP checksum and retransmission scheme ensures message integrity
despite the noise.


A much more serious problem has been the sporadic failure of
new memory modules hours, days, or even weeks after their
installation and testing. It did not appear economically feasible, at
the beginning of the retrofit program, to set up a memory module
"burn in" facility at BBN; thus the retrofit components are not
subjected to the same degree of pre-installation testing under our
control as is used for entire machines. Aside from relatively
straightforward machine failures (i.e., IMP solidly down) which
are attributable to retrofit activity, we have experienced a few
more difficult problems involving the loss of only a few bits of
memory; the results of these are discussed in Section 2.
1.2 The TIP's Network Status Facility

As mentioned in one of our previous Quarterly Technical Reports,*
a TIP command has been available for some time which
automatically connects the user to a Network Status Facility
(formerly NEWS). The Network Status Facility resided in the
TENEX system at BBN and, at the beginning of the first quarter,
provided three types of service:

•  A  "Netnews"  system  which  allowed  the  TIP  system 
programmers  to  communicate  to  users. 

• A "Gripe" system which allowed users to communicate
to the programmers.

•  A  "Host  Status"  system  which  reported  which  service 
Hosts  in  the  network  were  up  and  available. 

During the past quarter we have been working with the TENEX
group at BBN on their development of a system called the Resource
Sharing Executive (RSEXEC), since this system bears on the goal of
offering improved service to TIP users. Currently, the RSEXEC is
supported only on TENEX computers, although expansion to other
PDP-10s is expected soon, and expansion to other types of machines
may come eventually.

The overall design objective of the RSEXEC is to allow a user
to view all of the participating machines as a single resource;
the cooperation among the machines required to permit this view
should be accomplished through the ARPA Network in ways which are
invisible to the user. This concept can be extended to support
a TIP user by providing him with a (TIP) command which causes
connection with the "nearest" (e.g., the quickest to respond)
machine which supports the RSEXEC. Once connected to this system,
the user can be offered the full processing power provided by the
big system as though that processing were being provided directly
from the TIP. In fact, if the TIP were to automatically connect
all users to the RSEXEC, the naive user would be able to view its
command language as the TIP's command language.

*See footnote in Section 1.3.

By the end of the first quarter the Network Status Facility
had been moved from its somewhat constrained support by BBN's
TENEX to the more general support of the RSEXEC. The three
previously available systems are supported, as well as the following
new additions:

•  Describe  -  A  mechanism  which  provides  descriptions  of 
the  other  available  commands. 

•  Link  -  A  mechanism  to  allow  on-line  communication 
among  two  or  more  terminal  users  connected  to  RSEXEC. 

• Sndmsg (send message) - A general purpose "mail"
distribution facility.

• Trminf (terminal finder) - A mechanism which tells
a TIP user which TIP Multi-line Controller port he is
connected to (this is especially helpful to TIP users
connected via dial-in facilities).

• A variety of facilities for text editing (e.g., "delete
character" and "delete line") and terminal control
(e.g., "full duplex" and "set attention character").

We believe that continued expansion of the cooperation between
TIPs and the RSEXEC will allow the TIP to concentrate on the
terminal-handling functions which it must provide, while at the
same time providing the computer features and facilities which
users find convenient and pleasant. This seems directly in line
with the original goal for TIPs, namely to provide easy and
inexpensive access for terminal users to extensive computational
power.

1.3  Network  Traffic 

We have been collecting and summarizing network traffic at
the Network Control Center (NCC) for somewhat over 18 months,
and it seems appropriate to present some of the data at this time.
The NCC has been primarily concerned with two kinds of traffic
data: first, the total number of packets sent into the network
by each Host each month, and second, the percentage of total
bandwidth used on each communication circuit. The Host packet output
is further subdivided into internode traffic (source and
destination at different IMPs/TIPs) and intranode traffic (source and
destination at the same IMP/TIP).

Figure 1 shows an 18-month plot of the average daily
internode Host traffic. The average, of course, includes weekends
and holidays as well as business days. The data points fit quite
well to a straight line (on a log scale) drawn between the first
and last data points; thus the average daily network traffic
appears to be increasing at a steady exponential rate. The
traffic is increasing by an order of magnitude approximately
every 13 months, or doubling approximately every four months.
(During this 18-month period the number of nodes has merely
doubled, from 18 to 36.) During the month of March 1973, the
network carried almost two million packets per day; if the average
packet length is assumed to be 250 bits, this amounts to about 0.5
billion bits per day. It is also interesting to note that almost
all messages are single-packet messages.
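The two growth figures above agree with each other, and the daily bit volume follows from the stated packet count. The short Python sketch below checks this arithmetic. It assumes steady exponential growth and the stated 250-bit average packet length; the function names are illustrative only.

```python
import math


def doubling_time(months_per_tenfold: float) -> float:
    """Months for traffic to double, given the months it takes to grow tenfold.

    Assumes steady exponential growth, N(t) = N0 * 10 ** (t / T10), so the
    traffic doubles when 10 ** (t / T10) == 2, i.e. t = T10 * log10(2).
    """
    return months_per_tenfold * math.log10(2)


def bits_per_day(packets_per_day: float, bits_per_packet: float) -> float:
    """Daily volume in bits for a given packet rate and average packet length."""
    return packets_per_day * bits_per_packet


if __name__ == "__main__":
    print(f"doubling time: {doubling_time(13):.1f} months")
    print(f"March 1973 volume: {bits_per_day(2e6, 250):.1e} bits/day")
```

An order of magnitude every 13 months corresponds to doubling about every 3.9 months, which the text rounds to four.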
There are four data points which do not lie on the curve but
can be easily explained. During the last half of April and all of
May and June 1972, the U.S. Air Force was conducting network
experiments involving large data transfers between Tinker and
McClellan Air Force Bases. These experiments were terminated on
the last day of June. In October 1972 a TIP was temporarily
installed at the International Conference on Computer Communication,
as reported in our Quarterly Technical Report No. 16.* During the
three days of this conference, and the three preceding days, this
TIP was observed to generate over SO,000 messages per hour, with
a corresponding traffic load at the Hosts from which it was
obtaining service.


It is important for the NCC to monitor the fraction of each
communication circuit's bandwidth in use, as a predictor of
saturation of particular circuits. There has, of course, been an
increase in bandwidth used which corresponds to the increase in
internode traffic, but saturation does not appear to be imminent.
The useful bandwidth of a circuit is a function of overall circuit
bandwidth, IMP-IMP overhead, and packet length. For the 50 Kb
circuits which comprise almost all of the network, the useful
bandwidth is about 37 Kb when the average message length is 500
bits and about 22 Kb when the average message length is 125 bits
(in each of the above cases, RFNMs are counted as messages with
length zero). If it is assumed that the traffic is half RFNMs, then
the average message length is one half the average Host message
length. Table 1 summarizes the average line and busiest line data
collected during the month of March, for two assumptions about
average Host message length.


*Previous Quarterly Technical Reports were written under Contract
No. DAHC-15-69-C-0179.
TABLE 1

                  Average          Line Utilization, assuming an average
                  Messages/Day     Host message length of:
                                   1000 bits/message    250 bits/message

Average line      244,649          3.83%                1.61%
Busiest line      519,982          8.14%                3.42%
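The entries in Table 1 appear to follow from the useful-bandwidth figures quoted above. The Python sketch below is our reconstruction and not a documented NCC procedure. It assumes:

- each circuit runs a full 24-hour day at its useful bandwidth;
- half of all messages on a line are zero-length RFNMs, so the average line message is half the Host message length;
- the quoted 37 Kb and 22 Kb figures apply at 500-bit and 125-bit averages.

Under these assumptions it reproduces three entries exactly. For the busiest line at 1000 bits/message it gives 8.13% rather than 8.14%.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Useful bandwidth (bits/s) of a 50 Kb circuit, keyed by the average length in
# bits of messages on the line, with RFNMs counted as zero-length messages.
USEFUL_BPS = {500: 37_000, 125: 22_000}


def utilization(messages_per_day: int, host_msg_bits: int) -> float:
    """Fraction of a circuit's useful bandwidth carried on an average day.

    Assumes half the messages on the line are RFNMs, so the average line
    message is half the average Host message length.
    """
    line_msg_bits = host_msg_bits // 2
    capacity_per_day = USEFUL_BPS[line_msg_bits] * SECONDS_PER_DAY
    return messages_per_day * line_msg_bits / capacity_per_day


if __name__ == "__main__":
    for name, msgs in (("Average line", 244_649), ("Busiest line", 519_982)):
        cells = [f"{100 * utilization(msgs, bits):.2f}%" for bits in (1000, 250)]
        print(f"{name:<14}{msgs:>9,}  " + "  ".join(cells))
```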


The average numbers, of course, do not give a true picture
of possible saturation; for this one must also know the
peak-to-average traffic ratio. During the past 13-month period we have
computed an approximation to this ratio several times and found
the approximation to lie between three-to-one and four-to-one.
(For these approximations we found, in order, the busiest line,
its busiest day, and the busiest hour in that day, and compared
the traffic in that hour with the average hourly traffic on that
line.) Thus it would appear that the peak traffic that any line is
handling cannot be much greater than about one-third of its capacity,
and is more likely to be about 10% of capacity.
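The one-third bound follows from a short calculation, which is our worked step and is not spelled out in the original. Take the busiest line's average utilization under the more pessimistic 1000-bit assumption in Table 1, $u_{\text{avg}} = 8.14\%$. Scale it by the upper end of the observed peak-to-average ratio, $r = 4$:

$$
u_{\text{peak}} \approx r \, u_{\text{avg}} \le 4 \times 8.14\% \approx 32.6\% \approx \tfrac{1}{3}
$$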
2. NETWORK RELIABILITY

One of the major activities of the technical staff during
the first quarter was a more direct and intensified attempt to
improve network reliability. As described in Section 1.1, the
memory retrofit program which began this quarter contributed to
sporadic failures in several machines which had network-wide
effects. In addition, we had already experienced prolonged
hardware failure syndromes unrelated to the retrofit program at several
sites in the network, particularly in the Washington, D.C. area.

The normal procedures of calling in Honeywell and working with
Honeywell field engineers had not cleared up several of these
persistent failures, and it was felt that an escalation of BBN
involvement was needed to identify the exact causes of the problems.
Therefore, during much of February and March there were one or
more members of the staff at various sites in the network where
hardware problems were suspected. The first thing we found out
was that the operational IMP program did not give enough
diagnostic information about failures when they occurred, and that
the available test programs did not detect errors frequently
enough to justify their use. That is, the errors were appearing
at rather low frequency, from once every few hours to once every
few days. Therefore, we decided to try to make the operational
IMP program run when it could and report more information about
detected hardware errors, rather than keep the failing IMPs off
the network for days at a time. At the same time, we decided to
make the TIP programs more resilient, so that TIP users would be
less affected by those failures which did occur.




2.1 IMP Program Changes

Modifications to the IMP program had two independent goals:
we wanted to make the software less vulnerable to hardware
failures, and we wanted the software to isolate the failures and
report them to the NCC. The technique we chose was to generate
a software checksum on all packets as they are sent out over
a line, and to verify the checksum on all packets received on a
line. We suspected that the hardware failures in the Washington
area were happening between IMPs; that is, the packets were
correct before they were sent. The failures could be in the transfer
from memory to the output modem interface, in the output
interface, in the input modem interface at the other machine, or
in the transfer from the input modem interface to memory. Thus,
a memory-to-memory software checksum should be able to detect
these errors. (Note that this is in addition to the hardware
checksum from output modem interface to input modem interface.)

On March 12, a new version of the IMP program was released with
software checksum code. It uses a simple checksum, the sum of
all the words in a packet minus its length, on all inter-IMP
transactions: routing messages, data packets, RFNMs, etc. The
checksum reduces the effective processor bandwidth of the 516 IMP
from the equivalent of twenty 50-kilobit lines to fifteen. Any
packet found to have an incorrect checksum is discarded, a copy
of the data is sent to the NCC, and the previous IMP retransmits
the packet.
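Below is a minimal Python sketch of the inter-IMP check just described. We read "the sum of all the words in a packet minus its length" as the 16-bit sum of the packet's words with the word count subtracted. That reading, the 16-bit wraparound, and the names `packet_checksum` and `receive` are assumptions for illustration, not the IMP program's actual code.

```python
from typing import Callable, List

WORD_MASK = 0xFFFF  # assumption: 16-bit words, arithmetic modulo 2**16


def packet_checksum(words: List[int]) -> int:
    """Sum of the packet's words minus its length (in words), modulo 2**16."""
    return (sum(words) - len(words)) & WORD_MASK


def receive(words: List[int], checksum: int,
            report_to_ncc: Callable[[List[int]], None]) -> bool:
    """Verify a packet arriving from a neighboring IMP.

    Returns True (acknowledge) if the checksum matches. Otherwise the packet
    is discarded, a copy of the data goes to the NCC, and False is returned:
    no acknowledgment is sent, so the previous IMP retransmits.
    """
    if packet_checksum(words) == checksum:
        return True
    report_to_ncc(list(words))
    return False


if __name__ == "__main__":
    sent = [0o012345, 0o177770, 0o000017]
    cks = packet_checksum(sent)                # computed by the sending IMP
    garbled = [0o012345, 0o177760, 0o000017]   # one bit dropped in transfer
    reports: List[List[int]] = []
    print(receive(sent, cks, reports.append))
    print(receive(garbled, cks, reports.append))
    print(len(reports))
```

A plain sum costs one addition per word, which matters given the processor-bandwidth cost reported above. Its weakness is that it cannot notice two words exchanged in order.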

A  partial  list  of  the  hardware  problems  that  were  uncovered 
by  software  checksums,  and  subsequently  fixed,  includes: 

• One modem interface at the Aberdeen IMP dropped several
bits from several successive words in transferring data
into memory.
• One modem interface at the Belvoir IMP picked one or two
bits in a single word in transferring data into memory.

•  One  modem  interface  at  the  ETAC  TIP  dropped  the  first 
word  in  transferring  data  out  of  memory. 

There were other problems that were not detected by the software
checksum, such as dropped interrupts. This set of problems may
be explained by poor engineering of the high-speed DMC on 316
IMPs. All of the machines cited above are 316 IMPs with 3 modem
interfaces, and they are the only such machines in the network.

The third interface is in a separate drawer, and the total bus
length seems to be too long for the driving electronics in the
original design. We are presently investigating various ways to
fix these problems.

This first experience proved the value of a software checksum
on all inter-IMP transmissions. We have decided to extend the
checksum to detect intra-IMP failures as well, at the same time
cutting the cost of the checksum code by a factor of two. We
can obtain an end-to-end software checksum on packets, without
any time gaps, as follows:

• A checksum is computed at the source IMP for each packet
as it is received from the source Host.

• The checksum is verified at each intermediate IMP as it
is received over the circuit from the previous IMP.

• If the checksum is in error, the packet is discarded,
and the previous IMP retransmits the packet when it does
not receive an acknowledgment.
• The previous IMP verifies the checksum before it
retransmits the packet. If it finds an error, it has
detected an intra-IMP failure, and the packet is lost. If
not, then the first transmission was in error due to an
inter-IMP failure, and the previous IMP holds a good copy
of the packet.

• After the packet has successfully traversed several
intermediate IMPs, it arrives at the destination IMP. The
checksum is verified just before the packet is sent to the
Host.

This technique provides a checksum from the source IMP to the
destination IMP, plus fault isolation to a particular IMP if there
is an error. It is half as costly as the first inter-IMP checksum,
because only one checksum calculation is performed for each hop,
except when there is a retransmission. We intend to install this
checksum algorithm in the network shortly.
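The end-to-end scheme and its fault isolation can be sketched in Python as below, modeling a single hop. The sending IMP holds the packet together with the checksum computed once at the source IMP, and the hypothetical `line` callable stands in for the output interface, circuit, and input interface. The plain 16-bit sum and all names here are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

MASK = 0xFFFF


def checksum(words: List[int]) -> int:
    # Hypothetical stand-in for the IMP software checksum: 16-bit sum of words.
    return sum(words) & MASK


@dataclass
class Packet:
    words: List[int]
    checksum: int  # computed once, at the source IMP, and never recomputed


def source_imp(words: List[int]) -> Packet:
    """Checksum the packet as it is received from the source Host."""
    return Packet(list(words), checksum(words))


def hop(held: Packet, line: Callable[[List[int]], List[int]]) -> str:
    """Forward `held` (the sending IMP's retained copy) over one circuit.

    `line` may corrupt the words in transit. Returns what the scheme
    concludes about this hop.
    """
    arrived = line(list(held.words))
    if checksum(arrived) == held.checksum:
        return "delivered"           # next IMP verified it and acknowledges
    # The next IMP discards the packet and sends no acknowledgment. Before
    # retransmitting, the sending IMP re-verifies its own copy.
    if checksum(held.words) != held.checksum:
        return "intra-IMP failure"   # sender's copy was bad: packet is lost
    return "inter-IMP failure"       # sender's copy is good: retransmit it
```

At the destination IMP the same comparison is made just before the packet is handed to the Host. On an ordinary hop only the receiving IMP computes a checksum. The sender recomputes one only when a retransmission is needed, which matches the halved cost claimed above.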

At the same time that we were discovering inter-IMP failures
with software checksums on packets, we began to notice a different
kind of problem with intra-IMP failures. In these cases we were
primarily faced with memory problems, and they often affected the
IMP program itself, rather than the packets flowing through the
IMP. Our first attack on this problem was to build a PDP-1
program to verify the running IMP and TIP programs at a site against
the correct core images held at the PDP-1. The program
interrogates the IMP with DDT messages and prints out a list of
discrepancies. Using this program, we found memory failures at the
Lincoln IMP and elsewhere. We soon discovered that even this step
was not enough to guarantee network reliability. The core verifier
approach assumes that some bits in a word were changed in a core
write, in a core re-write after a read, or by a runaway program.
But it also assumes that the main body of the IMP program is
intact so that it can respond to DDT commands. We had two
experiences at this time which illustrated that memory failures can
be much more catastrophic.
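The core-verifier idea reduces to a word-by-word comparison against the reference image, sketched below in Python. The hypothetical `read_word` callable stands in for a DDT interrogation of the IMP, and the function name is illustrative. As the text notes, the approach only works while the IMP program is intact enough to answer those requests.

```python
from typing import Callable, List, Tuple


def verify_core(reference: List[int],
                read_word: Callable[[int], int]) -> List[Tuple[int, int, int]]:
    """Compare a running IMP's memory with the correct core image.

    `read_word(addr)` stands in for interrogating the IMP with a DDT
    message. Returns (address, expected, actual) for every word that differs.
    """
    discrepancies = []
    for addr, expected in enumerate(reference):
        actual = read_word(addr)
        if actual != expected:
            discrepancies.append((addr, expected, actual))
    return discrepancies


if __name__ == "__main__":
    image = [0o102030, 0o040506, 0o070707]
    running = [0o102030, 0o040406, 0o070707]   # one bit lost at address 1
    for addr, want, got in verify_core(image, running.__getitem__):
        print(f"{addr:06o}: expected {want:06o}, found {got:06o}")
```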

Specifically, there were two occasions when the routing code
in an IMP was incorrect due to a memory failure. Routing messages
are particularly important for network reliability, because if one
IMP is sending out incorrect routing information (broadcasting
that it is the best route to all destinations, for instance), the
rest of the network will suffer quickly and on a very large scale.
The problems we had were due to single broken instructions in
the part of the IMP program that builds the routing message. As
a result, the routing messages from that IMP were random data,
and the neighboring IMPs interpreted these messages as routing
update information. When this happened, traffic flow through
the network was completely disrupted, and no useful work could be
done until the failed IMP was halted.

This  kind  of  problem  can  happen  in  three  ways: 

• The routing message is changed in transmission. The
inter-IMP checksum should catch this; however, the bad routing
messages we saw in the network had good checksums.

• The routing message is changed as it is constructed
(say, by a memory or processor failure) or before it is
transmitted. This is what we termed an intra-IMP failure.

• The routing program is incorrect for hardware or software
reasons.
We intend to solve the last two kinds of problems by extending
the concept of software checksums. The routing program can build
a software checksum for the routing message as it builds the
message, just as if it came from a Host. Modem output can then
always verify the checksum on routing messages before transmitting
them. This scheme should detect all intra-IMP failures. Finally,
the routing code can calculate a checksum of its own instructions
before executing them, to detect any changes in the code. If a
discrepancy is found, the program will request a reload
immediately. These changes to improve the reliability of routing
will also be installed in the network shortly.
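The two planned routing safeguards can be sketched together in Python. The routing code checks its own instructions before running, and accumulates the message checksum as each word is built. Modem output then re-verifies that checksum before transmission. The 16-bit sum, the treatment of `code` as a list of instruction words, and all function names are hypothetical illustrations, not the IMP program's design.

```python
from typing import Callable, List, Optional, Tuple

MASK = 0xFFFF  # assumption: 16-bit additive checksum, as in the earlier sketch

Message = Tuple[List[int], int]  # (routing words, checksum)


def checksum(words: List[int]) -> int:
    return sum(words) & MASK


def build_routing_message(code: List[int], code_sum: int, delays: List[int],
                          request_reload: Callable[[], None]) -> Optional[Message]:
    """Build a routing message, accumulating its checksum word by word.

    `code` stands in for the routing program's own instructions in core and
    `code_sum` for their known-good checksum; both are hypothetical.
    """
    if checksum(code) != code_sum:       # routing code changed: do not run it
        request_reload()
        return None
    words: List[int] = []
    running = 0
    for delay in delays:
        words.append(delay)
        running = (running + delay) & MASK
    return words, running


def modem_output(message: Message, send: Callable[[List[int], int], None]) -> bool:
    """Verify a routing message's checksum just before transmitting it."""
    words, cks = message
    if checksum(words) != cks:           # changed after construction
        return False
    send(words, cks)
    return True
```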

2.2 TIP Program Changes

The hardware difficulties which we began to experience during
the first quarter had two effects on Host-to-Host communication.
First, the intermittent modem interface failures, of the type
seen at Belvoir, Aberdeen, and ETAC, meant that messages were
occasionally lost by the network. This loss is reported to the
transmitting Host by the "Incomplete Transmission" message
generated by the source IMP; the Host must then decide whether to
retransmit or to take some other action. Second, the higher than
normal incidence of machine failures meant that the network
sometimes "partitioned," so that there was no path between the two
communicating Hosts. (It should be noted that, contrary to the
original design, one current site is connected to the network by
only a single path; other similar connections are planned. For
any such sites, any failure along the single path will be seen
as a partition.) Since a TIP acts as a Host for its users, its
resilience when these types of failures occur has a major effect
on user satisfaction.
Prior to this quarter the TIP program "aborted" the user's
connection if it received an Incomplete Transmission indication
from the IMP program. During the quarter the TIP program (and
the programs of several other Hosts) was changed to retransmit
messages for which the Incomplete Transmission indication was
returned. On the other hand, it has not seemed reasonable to
continue attempting to transmit when the program receives a
"Destination Unreachable" indication, since this could arise
either from a network partition or from a failure at the
destination site. The interactive user is, of course, free to try again
manually.
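The revised TIP policy can be sketched in Python as follows. Incomplete Transmission triggers a retransmission, while Destination Unreachable ends the attempt and leaves any retry to the user. The enum, the function names, and the `max_attempts` cap are hypothetical; the report does not say whether the TIP limits its retransmissions.

```python
from enum import Enum, auto
from typing import Callable


class ImpReply(Enum):
    RFNM = auto()                     # Request For Next Message: delivered
    INCOMPLETE_TRANSMISSION = auto()  # message lost inside the network
    DESTINATION_UNREACHABLE = auto()  # partition, or destination site down


def send_message(message: bytes, send_to_imp: Callable[[bytes], ImpReply],
                 max_attempts: int = 5) -> str:
    """Send one message, retransmitting after Incomplete Transmission.

    `max_attempts` is a hypothetical safety cap, not part of the report.
    """
    for _ in range(max_attempts):
        reply = send_to_imp(message)
        if reply is ImpReply.RFNM:
            return "delivered"
        if reply is ImpReply.DESTINATION_UNREACHABLE:
            # Partition or dead destination: stop; the user may retry by hand.
            return "unreachable"
        # INCOMPLETE_TRANSMISSION: lost in the network, so send it again.
    return "gave up"
```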

A different situation pertains to tape transfers involving TIPs with the magnetic tape option. In these cases, the user would like to start the process and then ignore it until the transfer is finished. Network partitions, however infrequent, occur too often to be tolerated when tape transfers lasting many hours are in progress. Therefore, we made a significant modification to the TIP magnetic tape option: a sequencing mechanism was added to the tape transfer protocol which permits automatic recovery and continuation of the transmission after most kinds of network transients. With this mechanism in effect, and assuming a tape is mounted at the "other end", a complete tape can be transferred with a single command given at either end. If the connection goes dead in mid-transfer, the TIP magnetic tape software will attempt to reopen the connection until successful and then continue the transfer from where it left off. In addition to modifying the TIP magnetic tape option as specified above, we also modified the TENEX program which is able to communicate with the TIP magnetic tape option so that it remained compatible.
3. HSMIMP DEVELOPMENT

The development of the High Speed Modular IMP (HSMIMP) continued to be a major activity during the first quarter. The broad HSMIMP system design has solidified, and a paper describing the system has been prepared, submitted, and accepted for presentation at the National Computer Conference in June. Work on both the hardware and software has been progressing and, except for certain possible perturbations discussed below, we appear to be largely on schedule. In general the system design has held up excellently, and our conviction that this is an exciting new way to produce reliable computer systems of varying size and power has deepened.

Our working relationship with Lockheed has gone quite well. We remain convinced that the choice of the SUE architecture was correct, although the fact that the machine is relatively new has created a number of difficulties. We have had to press Lockheed to speed up some developments and to fix some bugs and undesirable features of its systems.

With regard to our own hardware development activity, many of the special cards have been built and debugged in prototype form. These include a real-time clock, the pseudo-interrupt flag card, the bus coupler cards, a modem interface, a Host interface, a DMA (memory-device block transfer) card, and some other minor cards. The pseudo-interrupt card has been produced in final printed circuit form, and the bus coupler cards are presently being converted to this form as well. We are also experimenting with an alternate manufacturing approach, known as "multi-wire", for some of the other cards; this might be an economical alternative to multi-layer printed circuit cards for small quantities. We have rented a semi-automatic wire wrap machine, which has made production of wire-wrapped boards faster and cheaper.

We have built several test programs and run them on a three-bus system, and are presently producing additional bus couplers so that a four-bus system can be assembled and tested. The bus couplers have been resolved into three card types. A BCP card sits at the processor end of couplers to either memory or I/O busses. A BCM card sits at the memory and I/O ends of processor couplers, as well as at the memory end of I/O-to-memory couplers. A third card type, the BCI, similar to the BCP but simpler, lies at the I/O bus end of I/O-to-memory couplers. The BCPs and BCMs, which form the great preponderance of the coupler cards in a system, are debugged, are being built in small quantities in wire-wrap form, and are being laid out for printed circuit manufacture. The BCI is in prototype test.
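The placement rules for the three coupler card types can be restated as a lookup table keyed by coupler kind and coupler end. The sketch below does this and tallies the cards needed for a configuration. The coupler-kind strings and the example configuration are hypothetical, chosen only to illustrate why BCPs and BCMs dominate; it also assumes each coupler carries exactly one card at each end.

```python
from collections import Counter

# (coupler kind, end of the coupler) -> card type, per the rules above.
CARD_AT_END = {
    ("processor-memory", "processor end"): "BCP",
    ("processor-memory", "memory end"): "BCM",
    ("processor-io", "processor end"): "BCP",
    ("processor-io", "I/O end"): "BCM",
    ("io-memory", "I/O end"): "BCI",
    ("io-memory", "memory end"): "BCM",
}


def card_counts(couplers):
    """Count the coupler cards needed for a list of coupler kinds."""
    counts = Counter()
    for kind in couplers:
        for (coupler_kind, _end), card in CARD_AT_END.items():
            if coupler_kind == kind:
                counts[card] += 1
    return counts


# Hypothetical system: two processor busses, each coupled to two memory
# busses and one I/O bus, plus one I/O bus coupled to two memory busses.
config = ["processor-memory"] * 4 + ["processor-io"] * 2 + ["io-memory"] * 2
print(card_counts(config))  # Counter({'BCM': 8, 'BCP': 6, 'BCI': 2})
```

Since every processor coupler contributes one BCP and one BCM, the simpler BCI appears only as often as there are I/O-to-memory couplers.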

Our attention has been focused primarily on the central system problems associated with getting a reasonably sized multi-bus prototype working. We have therefore deferred work on more esoteric interfaces, such as high speed modem and Host interfaces, satellite modem interfaces, etc. Nonetheless, these units have received a good deal of thought. The high speed (1.5 megabit/second) modem interface, for example, is compatible in most ways with the standard 50 kilobit/second modem interface, in part because a number of sophistications, such as elastic buffering, have been made part of the standard interface. The 306 modem requires different drivers and receivers but, except for speed, is functionally similar to the 303. We therefore do not foresee any particular problems in constructing the faster interface.
Another area, that of terminal handling, is benefiting from work on the related RJE effort discussed in Section 1.1 of our Quarterly Technical Report No. 16. Although terminal development is not specifically scheduled for this year, we have nonetheless discussed this matter at some length in order to be sure that our overall system design permits incorporation of terminal handling in a reasonable way.

We have begun building a set of test programs for testing systems and sub-systems at all levels. This work will certainly continue throughout the rest of the year. Since we will be producing a number of the special cards ourselves, at least for the present, we must produce test set-ups and programs to check out these cards. In addition, of course, flexible programs must be prepared to test a wide variety of system configurations. Our test programming has just about kept pace with our prototype designs; we now have some test programs which exercise systems of a few busses in size.

Now that the designs of many of the cards are completed and their general characteristics (size, power and cooling requirements, etc.) are better understood, we are taking a second look at a number of system-wide physical issues concerning power, cooling, mechanical layout, and modularity. Until recently, we had been planning to build our prototype into two six-foot cabinets. However, a recent decision to enlarge the prototype to test out a dual I/O bus configuration, coupled with the need for some extra working space to experiment with varied configurations, caused the system to overflow the second cabinet. In addition, we are facing some problems with cooling and with inter-bus and inter-cabinet cabling. We are therefore presently investigating another approach which would package and cool units in a more modular fashion. This review has led to a reconsideration of the bus coupling cables as well. In our prototype bus couplers we have been using two 2-inch-wide flat cables to interconnect the ends of each bus coupler. We are currently considering ways to use fewer and more manageable cables.

A further area of concern has been the development of faster and more compact memories for the SUE. Because the system is quantized in single-bus units, because memories tend to fill up processor and memory busses, and because the memories are not quite fast enough to be shared locally between the processors on a processor bus, the system design could benefit greatly from a faster and physically smaller memory. In addition, the lack of parity on the SUE memories, although they apparently perform well, is a source of concern to us. We have therefore explored, and are continuing to explore, the possibility of obtaining an IC memory system through a number of vendors, including Lockheed. We have been somewhat discouraged by our findings to date: we have been unable to find a vendor willing to supply a compatible memory of appropriate size and speed for a reasonable price. This situation, however, seems likely to change as 4K memory chips become commercially available.

Coding for the new machine has started, and a DDT has been constructed. The conversion of the IMP algorithms from one machine to another represents a large but straightforward task.

The fact that the new machine is a multiprocessor leads to some new technical concepts.

Our basic control passing mechanism, as described in Section 4 of our Quarterly Technical Report No. 15, is to break the program up into 300-microsecond "strips", and to resample the Pseudo-Interrupt Device (PID) at the end of each strip to determine what strip to execute next. At this writing, this control mechanism seems easy to code for, efficient, and easier to debug than the interrupt philosophy of the 516 IMPs.

The basic function of the ARPA computer network is to allow large existing computers (Hosts), with different system configurations, to communicate with each other. Each Host is connected to an Interface Message Processor (IMP), which transmits messages from its Host(s) to other Hosts and accepts messages for its Host(s) from other Hosts.

There is frequently no direct communication circuit between two Hosts that wish to communicate; in these cases intermediate IMPs act as message switchers. The message switching is performed as a store and forward operation. The IMPs regularly exchange information which: allows each IMP to adapt its message routing to the conditions of its local section of the network; reports network performance and malfunctions to a Network Control Center; permits message tracing so that network operation can be studied comprehensively; and allows network reconfiguration without reprogramming each IMP. The Terminal IMP (TIP), which consists of an IMP and a Multi-Line Controller (MLC), extends the network concepts by permitting the direct attachment (without an intervening Host) of up to 64 dissimilar terminal devices to the network. The Terminal IMP program provides many aspects of the Host protocols in order to allow effective communication between a terminal user and a Host process. A High Speed Modular IMP (HSMIMP) is under development; one goal of this effort is to increase IMP performance by a factor of 10.

Our basic locking mechanism is the read-and-clear instruction, implemented in the bus coupler. It too seems to be working out well, although we have not yet tried using it in a multiprocessor system.
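The strip-based control passing and the read-and-clear lock described above can be modeled in a few lines. This is a software sketch under stated assumptions, not HSMIMP code: a Python `threading.Lock` stands in for the atomicity the bus coupler provides in hardware; the PID is modeled as a set of pending strip numbers in which a lower number means higher priority and a read returns (and clears) the most urgent one; and the strip names and numbers are hypothetical.

```python
import threading


class ReadAndClearWord:
    """A shared memory word whose read-and-clear is atomic.
    threading.Lock models the bus coupler's hardware atomicity."""

    def __init__(self, value=1):
        self._value = value
        self._atomic = threading.Lock()

    def read_and_clear(self):
        with self._atomic:
            value, self._value = self._value, 0
            return value

    def write(self, value):
        with self._atomic:
            self._value = value


class ReadAndClearLock:
    """Lock word convention: nonzero means free, zero means held."""

    def __init__(self):
        self._word = ReadAndClearWord(1)

    def try_acquire(self):
        # Whoever reads the nonzero value (and so clears it) owns the lock.
        return self._word.read_and_clear() != 0

    def release(self):
        self._word.write(1)


class PseudoInterruptDevice:
    """Pending strip numbers; lower number = higher priority (assumed)."""

    def __init__(self):
        self._pending = set()
        self._atomic = threading.Lock()

    def post(self, strip):
        with self._atomic:
            self._pending.add(strip)

    def read(self):
        # Return the most urgent pending strip and clear its flag.
        with self._atomic:
            if not self._pending:
                return None
            strip = min(self._pending)
            self._pending.remove(strip)
            return strip


def run_processor(pid, strips):
    """Execute strips until none are pending. Each strip is a short routine
    (about 300 microseconds in the HSMIMP design) that returns here, so the
    PID is resampled after every strip."""
    while (strip := pid.read()) is not None:
        strips[strip](pid)


# Hypothetical strips: 0 = modem input, 1 = timeout check, 2 = forwarding.
log = []
queue_lock = ReadAndClearLock()


def modem_input(pid):
    log.append("modem input")
    pid.post(2)  # hand the new packet to the forwarding strip


def timeout_check(pid):
    log.append("timeout check")


def forward(pid):
    if queue_lock.try_acquire():  # guard a shared output queue
        log.append("forward")
        queue_lock.release()
    else:
        pid.post(2)  # lock busy: try again in a later strip


pid = PseudoInterruptDevice()
pid.post(1)
pid.post(0)
run_processor(pid, {0: modem_input, 1: timeout_check, 2: forward})
print(log)  # ['modem input', 'timeout check', 'forward']
```

Because no strip runs long and none is preempted, each processor only needs to decide what to do next at strip boundaries, and the read-and-clear word is the only primitive needed to keep two processors out of the same shared structure.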

Coding of the basic store-and-forward path has progressed to the point where some of its crucial properties can be calculated by counting instructions. Most of our original assumptions as to instruction counts and memory references have been borne out. So far, this code is read-only from local memory; neither instructions nor temporaries need to be stored locally. The overall slowdowns due to conflicts and to Bus Coupler delay presently appear to be somewhat worse than originally estimated; however, we still anticipate that the 1^-processor HSMIMP will provide roughly a factor of ten increase in bandwidth over the 516 IMP.
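The kind of estimate that instruction counting makes possible can be written down directly. The sketch below is a simplified model, not one taken from this report: each of n processors spends `instructions_per_packet` instructions of `instruction_time_us` microseconds on each packet, stretched by a fractional `slowdown` for conflicts and Bus Coupler delay. All parameter values in the example are placeholders, not measured HSMIMP or 516 IMP figures.

```python
def packets_per_second(n_processors, instructions_per_packet,
                       instruction_time_us, slowdown):
    """Aggregate store-and-forward rate, assuming the processors work on
    different packets in parallel with no other bottleneck."""
    us_per_packet = instructions_per_packet * instruction_time_us * (1 + slowdown)
    return n_processors * 1e6 / us_per_packet


def speedup(new_rate, old_rate):
    return new_rate / old_rate


# Placeholder values for illustration only.
single = packets_per_second(1, 200, 1.0, 0.0)   # one uncontended processor
multi = packets_per_second(10, 200, 1.0, 0.25)  # ten processors, 25% slowdown
print(speedup(multi, single))  # 8.0
```

In this model a slowdown s reduces the ideal n-fold gain to n/(1+s), which is why worse-than-expected conflict and coupler delays matter even when instruction counts come out as planned.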